Abstract This research explores the potential utility of response
latency as an index of question problems in survey research. The time
respondents took to answer three types of bad questions was compared
to the time they took to answer the repaired versions of the
questions. Questions containing a superfluous negative and
double-barreled questions took longer to answer than nearly identical
questions without these problems. Repaired versions of questions
soliciting frequency estimates, however, took longer to answer than
their problematic counterparts. The results are discussed in the
context of a model of question answering, and their implications for
survey methodology are explored.


Survey methodologists have expressed considerable interest recently 
in techniques for screening survey questions with the aim of repairing 
bad questions before presenting them to large numbers of respondents 
(see Presser and Blair 1994). A number of approaches for the early 
identification of question problems have been explored. Observational 
monitoring focuses on the interaction between the interviewer and 
respondent and relies on a behavior-coding scheme to identify prob- 
lems (e.g., Fowler and Cannell 1996). The cognitive interview is a 
method for gathering detailed information from respondents about the 
processes involved in the formulation of responses and involves exten- 
sive probing, either during the interview or immediately after it (e.g., 
Jobe, Tourangeau, and Smith 1993). Analysis of the verbal output 
based on “think aloud” protocols obtained from respondents
retrospectively or while they answer questions has also been implemented,
sometimes with automatic coding of the protocols (e.g., Bolton 1993). 
In addition, methods for coding the questionnaire itself have also been 
developed (e.g., Lessler and Forsyth 1996). 


JOHN N. BASSILI is professor of psychology at the Scarborough campus of the University 
of Toronto. B. STACEY SCOTT is an undergraduate student at that campus. This research 
was supported by Social Sciences and Humanities Research Council of Canada grant 
40-94-0170. 


Public Opinion Quarterly Volume 60:390-399 © 1996 by the American Association for Public Opinion Research
All rights reserved. 0033-362X/96/6003-0003$02.50




Response Latency and Question Problems 391 


The present research explores the utility of a cognitive index of 
information processing in the identification of question problems. The 
index is response latency, a measure that is increasingly simple to 
obtain in survey research with the widespread use of computers. For 
example, a previous work (Bassili 1996) discusses a methodology that 
allows accurate response latency measurement in CATI surveys that 
can be implemented more economically than the question-screening 
methods just reviewed. 

Response latency is a general measure of the amount of information 
processing necessary to answer a question. The type of information 
processing indexed by response latency is diverse. To appreciate this, 
it is useful to examine a popular model of the steps involved in answer- 
ing a question (Strack and Martin 1987; Tourangeau 1987; Tourangeau 
and Rasinski 1988). According to this model, question answering in- 
volves four distinguishable steps: question interpretation, memory re- 
trieval, information integration, and response selection. Response la- 
tency is sensitive to information processing at each one of these steps 
(see Bassili 1996). Because question-answering problems can arise 
from difficulties at each of the four steps of the model, response latency 
provides a fittingly broad index of information-processing demands. 
The working assumption we adopt in this research is that question 
problems tend to slow responses because the resolution of the problem 
requires processing time. The specific aim of the research, therefore, 
is to test whether bad questions take longer to answer than good ques- 
tions. To the extent that this is the case, response latency has the 
potential for signaling question problems. 

The research explores response latency for three types of bad ques- 
tions: questions containing a superfluous negative, questions con- 
taining a reference to two distinct themes (i.e., double-barreled ques- 
tions), and questions that were shown by behavioral research to elicit 
high levels of problem behaviors (Fowler 1992). A “repaired” version
of each question was prepared (the double-barreled questions had two 
repaired versions, one focusing on each theme) and, although the order 
of questions in the survey was fixed, the version of each question 
presented to a respondent was determined randomly. This approach 
allowed us to examine experimentally the impact of known question 
problems on response latency. 


The Study 


Two interviewers conducted the field work by telephone in the spring 
of 1994 from a laboratory facility at an urban university. A random 
sample of 289 valid student telephone numbers was drawn from the 
university's registration records, from which 200 successful completions
were secured, representing a response rate of 69.2 percent.

The questionnaire consisted of 26 opinion questions, three of which, 
in their bad form, contained a superfluous negative and six of which 
made reference to two distinct themes. These questions were answered 
by a simple expression of agreement or disagreement with the assertion 
contained in the item. Four questions that behavioral research (Fowler 
1992) has shown to be problematic were also included. Of these, one 
was answered by “Yes” or “No,” and the rest required frequency
estimates. Each opinion question had a “good” form that did not contain
a superfluous negative or that referred to only one of the two
themes touched on by a double-barreled question. The good form of 
the four questions derived from behavioral research consisted of the 
repaired versions developed by Fowler (1992). 

Response latency was measured by the interviewer by pressing the 
space bar of the computer keyboard upon finishing the delivery of the 
question and again as soon as the respondent gave an answer. Mea- 
sured response latencies, therefore, do not include the time taken by 
the interviewer to read the question. Response latencies in which re- 
spondents asked for any type of information before answering were 
coded as invalid by the interviewer and were treated as missing data 
(for details on this procedure, see Bassili 1996). The coding of the 
validity of the response latency measure was followed by a secondary 
code tracking three behaviors: whether a request for repetition or clari- 
fication was made and whether, in the case of double-barreled ques- 
tions, the respondent asked which aspect of the question should be 
answered. Because these problem behaviors involve questions to the 
interviewer, they always resulted in the invalidation of the correspond- 
ing response latencies. 
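
The timing procedure described above can be illustrated with a minimal sketch. It is not the software used in the study (see Bassili 1996 for that). The class name LatencyTimer, its method names, and the injectable clock are hypothetical and chosen only for illustration. The sketch assumes two key presses per question and treats any latency preceded by a request to the interviewer as missing.

```python
import time


class LatencyTimer:
    """Hypothetical helper: times the gap between the end of the
    question and the respondent's answer (two key presses)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the timer can be tested
        self._start = None

    def question_delivered(self):
        # First key press: the interviewer has finished reading the question,
        # so reading time is excluded from the latency.
        self._start = self._clock()

    def answer_given(self, respondent_asked_for_information=False):
        # Second key press: the respondent has given an answer.
        if self._start is None:
            raise RuntimeError("question_delivered() must be called first")
        latency = self._clock() - self._start
        self._start = None
        # Requests for repetition, clarification, or guidance invalidate
        # the latency, which is then treated as missing data.
        if respondent_asked_for_information:
            return None
        return latency
```

Returning None for invalid latencies keeps them out of later averages without discarding the respondent's substantive answer.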


The Findings 


Latency invalidation occurred on an average of about 11 percent of 
the cases across questions. In addition, “Don’t knows” were
excluded, and to minimize the impact of outliers in our analyses, latencies
were truncated at two standard deviations above the mean.¹ Index


1. “Don’t knows” averaged 3.8 percent and ranged from 0 percent to 11.5 percent, the
latter figure coming from the double-barreled question on aboriginal rights. Because
“Don’t knows” were more frequent for one of the single-barreled forms of this question
than for the double-barreled form, theme multiplicity does not appear to be responsible 
for this high figure. There were no refusals to answer among the present responses. 
Note also that to reduce the skewness of distributions of response latency scores, loga- 
rithmic or reciprocal transformations are often used. The reciprocal transformation of 
the present scores did not alter the substantive findings presented here. 
Table 1. Mean Response Latencies and Proportion of Problem
Behaviors for Questions with Superfluous Negatives
and Their Affirmative Versions


                                                      Repeat or Clarify
Question Form                          Mean Latency          (%)

Negative: More should be done by
  businesses to reduce inequality in
  the workplace.                           4.9a               6.3
Affirmative: More should be done by
  businesses to increase equality in
  the workplace.                           4.8a               2.9
Negative: Policies that do not
  safeguard the environment are bad.       3.8a              15.7
Affirmative: Policies that safeguard
  the environment are good.                3.3b               4.1
Negative: Canada does not do
  enough to reduce pollution.              3.6a               4.1
Affirmative: Canada should do more
  to reduce pollution.                     3.1b               1.0


Note.—Latencies are in seconds, rounded off to one decimal point. Means within a 
question type that do not share superscripts differ from each other with p < .001. 
Problem behaviors were coded by the interviewer after coding the validity of the re- 
sponse latency measure. 


scores were prepared for each type of question by averaging response 
latencies, in standard scores, for the bad and the good forms of the 
questions, respectively. 
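
The outlier handling and index scores described above can be sketched as follows. The sketch rests on two assumptions that the text does not spell out: that "truncated" means latencies above the cutoff were set equal to the cutoff, and that standard scores were computed within each question across all respondents. The function names are hypothetical.

```python
from statistics import mean, stdev


def truncate_high(latencies, n_sd=2.0):
    """Cap latencies at mean + n_sd standard deviations (assumed reading
    of 'truncated at two standard deviations above the mean')."""
    cutoff = mean(latencies) + n_sd * stdev(latencies)
    return [min(x, cutoff) for x in latencies]


def standard_scores(latencies):
    """Convert one question's latencies to standard (z) scores."""
    m, s = mean(latencies), stdev(latencies)
    return [(x - m) / s for x in latencies]


def index_score(z_scores):
    """Average a respondent's z scores over the questions of one form
    (bad or good), skipping missing latencies coded as None."""
    valid = [z for z in z_scores if z is not None]
    return mean(valid) if valid else None
```

Each respondent would then have one index score for the bad forms and one for the good forms of a question type, and these paired scores feed the comparisons reported next.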


SUPERFLUOUS NEGATIVES 


A t-test for paired samples revealed that, on the average, questions 
took longer to answer when they contained a superfluous negative (M 
= 4.1 seconds) than when they did not (M = 3.8 seconds), t(139) =
5.13, p < .001. The differences were significant for two of the three 
questions (see table 1). 

On the average, questions with a superfluous negative prompted 
requests for repetition or clarification 8.7 percent of the time, whereas 
affirmative questions did so only 2.7 percent of the time (see table 1). 
The slowing effect of superfluous negatives is thus consistent with 
expected behavioral consequences of problem questions. 
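
For readers who want to see the form of the test, a paired-samples t statistic can be computed as in the sketch below. The function name and the toy input are hypothetical, and the study's respondent-level scores are not reproduced here. With n paired scores the statistic has n - 1 degrees of freedom, so the reported t(139) implies 140 respondents with a score on both forms.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(bad, good):
    """Return (t, df) for paired scores from the same respondents."""
    if len(bad) != len(good):
        raise ValueError("paired samples must have equal length")
    diffs = [b - g for b, g in zip(bad, good)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Toy example with made-up latencies (seconds), not data from the study.
t, df = paired_t([4.0, 5.0, 6.0, 7.0], [3.0, 3.0, 5.0, 5.0])
```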
DOUBLE BARRELS


A one-way repeated-measure ANOVA showed that the average re- 
sponse latencies for double-barreled items (M = 7.7 seconds) were 
longer than for questions containing only one theme (M = 5.2 seconds 
for the first theme and M = 6.7 for the second theme), F = 15.58, 
df = 2,244, p < .001. All six questions took significantly longer to 
answer when they contained two themes than when either of their 
themes was presented alone (see table 2). Theme multiplicity thus 
appears to slow responses. 

On the average, respondents asked for clarification or repetition of 
the question, or for guidance on which “barrel” to answer, in 13.5
percent of the cases for double-barreled questions and in 11.4 percent 
of the cases for single-barreled questions (see table 2). 
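
The one-way repeated-measures ANOVA reported above partitions the variance in respondents' scores into a part due to question form, a part due to respondents, and a residual. The sketch below shows that computation under the assumption of complete data (one score per respondent for each of the three forms). The function name and input layout are hypothetical. With k forms and n respondents the degrees of freedom are k - 1 and (k - 1)(n - 1), so df = 2,244 would correspond to 123 respondents, a count inferred here from the df and not stated in the text.

```python
from statistics import mean


def rm_anova(rows):
    """One-way repeated-measures ANOVA.

    rows: one list per respondent, one score per condition.
    Returns (F, df_conditions, df_error).
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for r in rows for x in r)
    cond_means = [mean(r[j] for r in rows) for j in range(k)]
    subj_means = [mean(r) for r in rows]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    # Residual variation after removing condition and respondent effects.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error
```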


BEHAVIORAL QUESTIONS 


The average response latency for the four questions that past behav- 
ioral research had shown to engender a high incidence of problem 
behaviors was compared to the average response latency for their re- 
paired versions. These results contained a surprise. Although a t-test 
for paired samples was highly significant, t(153) = 7.35, p < .001, it
is the repaired versions of these questions that took longer to answer 
than the problematic versions (M = 9.3 seconds vs. M = 7.7 seconds). 
The effect was observed in three of the four questions (see table 3). 

As shown in table 3, the proportion of respondents who asked for 
the question to be repeated or clarified was higher, on the average, for 
the original version of questions (15.9 percent) than for their repaired
version (13.5 percent). The relatively small difference in average
percentages, however, hides the fact that in three of the four cases, Fowler’s repaired
versions were dramatically better than the original versions in minimiz- 
ing requests for repetition and clarification. The one exception, which 
showed a marked reversal in this pattern, was the question about ill- 
ness. Our presentation of this question, however, omitted transitional 
comments that were included by Fowler and that may have been criti- 
cal to the clarity of the question. 
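
The arithmetic behind this point can be checked directly from the percentages reported in table 3. The short sketch below recomputes the two averages and then repeats the calculation without the illness question. The variable and function names are illustrative only.

```python
from statistics import mean

# Repeat-or-clarify percentages as reported in table 3.
original_pct = {"exercise": 5.0, "butter": 30.1, "eggs": 11.7, "illness": 16.7}
repaired_pct = {"exercise": 0.0, "butter": 9.7, "eggs": 5.6, "illness": 38.8}


def average(pcts, exclude=()):
    """Mean percentage over questions, optionally leaving some out."""
    return mean(v for q, v in pcts.items() if q not in exclude)


print(round(average(original_pct), 1), round(average(repaired_pct), 1))
# prints 15.9 13.5
print(round(average(original_pct, exclude=("illness",)), 1),
      round(average(repaired_pct, exclude=("illness",)), 1))
# prints 15.6 5.1
```

Leaving out the illness question, the repaired versions prompt roughly a third as many requests as the originals, which the overall averages conceal.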


Discussion 


The results from questions containing a superfluous negative or a mul- 
tiplicity of themes provide clear evidence for the slowing effect of 
poor questions on response latency. To appreciate this finding, it is 
important to bear in mind that timing in the present methodology starts 
at the end of the question and does not include the time spent by the
interviewer reading the question.² It appears, instead, that the extra
time is spent by the respondent resolving matters having to do with 
the referent of the question. For questions that contain a superfluous 
negative, this extra time is most likely spent disentangling the meaning 
of the question. In the context of the four-step question-answering 
model discussed here, this activity is relevant to step 1 (question inter- 
pretation). In the case of double-barreled questions, the extra time is 
probably spent choosing a focus among the two options presented in 
the question and/or integrating feelings toward the two foci into one 
evaluation. These mental activities are thus relevant to the first (ques- 
tion interpretation) and third (information integration) steps of the 
question-answering model. 

The results associated with the four questions derived from research 
involving behavior coding stand in contrast to those just discussed, 
not only because these questions took longer to answer in their re- 
paired than in their problematic form but because response latencies 
were generally much longer than for the other questions in this study. 
We suspect that the pattern of results associated with these questions 
reflects the cognitive demands of frequency estimation (see Felcher 
and Calder 1990). Ironically, clear questions soliciting frequency esti- 
mates may focus respondents better on the demanding memorial 
search required by frequency estimation than questions that solicit the 
task with less clarity. Accordingly, differences in response latencies 
for these questions reflect activity at the second step (memory re- 
trieval) of the question-answering model. 

How can the relationships between question problems and response 
latency documented here be used in screening survey questions? The 
answer, we believe, must consider the broad context of information 
that is always necessary to the interpretation of response latencies. In 
the same way that scores derived from other methods for screening 
questions (e.g., behavior coding, protocol analysis, cognitive inter- 
viewing) require interpretation by comparison to known norms or vari- 
ations in question form, so too do response latency scores. Thus, we 
suspect that researchers would benefit from the development of re- 
sponse latency norms for the types of questions they customarily in- 
clude in their questionnaires. 
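
One way such norms might be put to use is sketched below. This is only an illustration of the idea, not a procedure proposed or tested in this research. The function name, the use of a z score against a set of norm latencies, and the cutoff of 2.0 are all assumptions. A flag would signal that a question deserves a closer look, since slow answers can also reflect a demanding but clear question.

```python
from statistics import mean, stdev


def flag_slow_questions(mean_latencies, norm_latencies, z_cutoff=2.0):
    """Return {question: z} for questions whose mean latency lies more
    than z_cutoff standard deviations above the norm for their type.

    mean_latencies: {question label: mean latency in seconds}
    norm_latencies: mean latencies of comparable, previously used questions
    """
    norm_mean = mean(norm_latencies)
    norm_sd = stdev(norm_latencies)
    flagged = {}
    for question, latency in mean_latencies.items():
        z = (latency - norm_mean) / norm_sd
        if z > z_cutoff:
            flagged[question] = z
    return flagged


# Toy example with made-up numbers.
norms = [3.0, 3.5, 4.0, 4.5, 5.0]
flags = flag_slow_questions({"q1": 4.2, "q2": 7.0}, norms)
```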

It is instructive that many but not all the latency effects documented 


2. Although timing was begun after the interviewer finished reading the question, the 
mere length of the question possibly still affected response latencies. Because affirmative 
and single-barreled questions are generally shorter than their problematic counterparts, 
and because the repaired versions of Fowler’s questions are longer than the original 
versions, it would be advisable for future researchers to attempt to control for question
length.




Table 2. Mean Response Latencies for Double-Barreled Questions
and Their Two Single Barrels


Columns: Question Form; Mean Latency; Repeat or Clarify/Theme (%)
[The mean latency and percentage entries are illegible in the source scan
and are not reproduced; the question forms are listed in table order.]

Double: It is important for Canada to make a special effort to protect
  ethnic and racial minorities and to mend the rift between Quebec and
  the rest of Canada.
Single A: It is important for Canada to make a special effort to protect
  ethnic and racial minorities.
Single B: It is important for Canada to mend the rift between Quebec and
  the rest of Canada.
Double: Aboriginal peoples should not have to pay taxes, and their
  takeover of a downtown building was a justifiable action against the
  government.
Single A: Aboriginal peoples should not have to pay taxes.
Single B: The Aboriginal takeover of a downtown building was a justifiable
  action against the government.
Double: Having an extended school year will help students focus on their
  studies more and keep them off the street and out of trouble.
Single A: Having an extended school year will help students focus on their
  studies more.
Single B: Having an extended school year will help keep students off the
  streets and out of trouble.
Double: Same-sex couples and single parents should not be able to adopt
  children.
Single A: Same-sex couples should not be able to adopt children.
Single B: Single parents should not be able to adopt children.
Double: The level of crime has increased over the last couple of years,
  and Toronto qualifies as a high-crime city.
Single A: The level of crime has increased over the past couple of years.
Single B: Toronto qualifies as a high-crime city.
Double: If a patient suffers from a dangerous, contagious disease, the
  doctor should keep the patient’s identity confidential and should only
  be required to notify local health authorities when there is a national
  health emergency.
Single A: If a patient suffers from a dangerous, contagious disease, the
  doctor should only notify local health authorities when there is a
  national health emergency.
Single B: If a patient suffers from a dangerous contagious disease, the
  doctor should keep the patient’s identity confidential.


Note.—Latencies are in seconds, rounded off to one decimal point. Means within a
question type that do not share superscripts differ from each other with p < .05.
Problem behaviors were coded by the interviewer after coding the validity of the
response latency measure.



398 John N. Bassili and B. Stacey Scott 


Table 3. Mean Response Latencies for Behavior-Coded Questions
and Their Repaired Versions


                                                      Repeat or Clarify
Question Form                          Mean Latency          (%)

Original: Do you exercise or play
  sports regularly?                       12.6a               5.0
Repaired: Do you do any sports or
  hobbies involving physical
  activities, or any exercise, including
  walking, on a regular basis?            17.1b               0.0
Original: What is the average number
  of days each week you have butter?       3.8a              30.1
Repaired: Not including margarine,
  what is the average number of
  days each week you have butter?          5.0b               9.7
Original: What is the number of
  servings of eggs in a typical day?       4.2a              11.7
Repaired: On days you eat eggs, how
  many eggs do you usually have?           3.9a               5.6
Original: During the past 12 months,
  that is, since January 1, 1994,
  about how many days did illness
  keep you in bed for more than half
  of the day?                              8.8a              16.7
Repaired: During the past 12 months,
  since January 1, 1994, on about
  how many days did you spend
  several extra hours in bed because
  you were sick, injured, or just not
  feeling well?                           10.1b              38.8


Note.—Latencies are in seconds, rounded off to one decimal point. Means within
a question type that do not share superscripts differ from each other with p < .001. 


here found their counterpart in behavior problems. For example, the 
last item in table 2 takes longer to answer in its double-barreled form 
than when either theme is presented alone, yet the double-barreled 
form produces the fewest behavioral problems. This is probably because
the two themes (reporting health problems to authorities vs. confiden- 
tiality) work together in creating a clear dilemma in the double-barreled 
form of the question, and this dilemma requires time to resolve. The 
lesson for us is that long response latencies do not always indicate
question problems. 

Although the present research only begins to explore the relationship 
between response latency and other indexes of question problems, the 
results we have presented suggest that although overlap probably oc- 
curs in the indexing value of these measures, the overlap is not perfect. 
It would be interesting in the future to explore the distinctive contribu- 
tion that response latency measurement can make to the identification 
of particular question problems (Presser and Blair 1994). 

Response latency provides an unusually economical index of ques- 
tion problems, and we believe that researchers will benefit from 
tracking on an ongoing basis how long it takes respondents to answer 
their questions. 